Workshop Session at 2025 Rawley Conference
November 6, 2025
Assistant Professor, Digital Scholarship & Sciences Librarian, Research Partnerships, University Libraries, University of Nebraska–Lincoln (August 2025–)
topicmodels for LDA & stm for Structural Topic Modeling
Parliament’s Debates about Infrastructure (Guldi, 2019)
The Dangerous Art of Text Mining (Guldi, 2023)
[W]ithout help from the humanities, data science can distort the past and lead to perilous errors.
| Feature / Method | LDA (Blei et al., 2003) | BERTopic (Grootendorst, 2022) | TopicGPT (Pham et al., 2024) |
|---|---|---|---|
| Approach | Probabilistic topic model (Bayesian) | Transformer-based embeddings + clustering + class-based TF-IDF | LLM / GPT-based prompting and summarization |
| Best for | Longer text (hundreds of words per document) | Short text (tweets, headlines), mixed-length corpora | Flexible lengths |
| Input Representation | Bag of words | Dense semantic embeddings | Natural-language prompts |
| Output | Topic-word distributions; document-topic probabilities | Automatically generated, human-readable topic labels; cluster assignments | Natural-language topic labels; coherent summaries |
| Interpretability | Requires human reading of top words | Usually more coherent topics | Often most interpretable |
| Computational Cost | Low–moderate | Moderate; depends on embedding model | High; relies on LLM inference |
| Libraries / Tools | topicmodels, stm (R), gensim, scikit-learn (Python), MALLET | bertopic (Python) | custom LLM pipelines |
| Parameter Tuning | Number of topics k | HDBSCAN clustering parameters, dimensionality reduction | Prompt templates; LLM model choice |
| Advantages | Proven, transparent, reproducible, lightweight | Works well for short texts; coherent topics; automatic naming | Extremely interpretable; strong for messy or domain-specific corpora |
| Limitations | Bag-of-words misses semantics; topics can be vague | Can struggle with very small corpora or noisy embeddings | Expensive, proprietary models, reproducibility issues |
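
To make the "Bag of words" and "class-based TF-IDF" entries in the table more concrete, here is a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not the bertopic implementation: the toy corpus, the hand-assigned cluster labels (`roads`, `rail`), and the helper names `tokenize`, `c_tf_idf`, and `top_terms` are all hypothetical, and in a real BERTopic run the clusters would come from embeddings plus HDBSCAN rather than being written by hand. The weighting follows the class-based TF-IDF idea as commonly described for BERTopic: a term's share of its cluster's words multiplied by log(1 + A / f), where A is the average number of words per cluster and f is the term's total frequency across clusters.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): each document is paired with a hand-assigned cluster label.
docs = [
    ("roads", "the road bill funds new road repairs"),
    ("roads", "bridge and road repairs need funds"),
    ("rail", "the rail bill funds new rail lines"),
    ("rail", "rail lines need new stations"),
]

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()

# Bag of words per cluster: word order is discarded, only counts are kept.
class_counts = {}
for label, text in docs:
    class_counts.setdefault(label, Counter()).update(tokenize(text))

# Total frequency of each term across all clusters.
term_totals = Counter()
for counts in class_counts.values():
    term_totals.update(counts)

# Average number of words per cluster (A in the formula above).
avg_words = sum(term_totals.values()) / len(class_counts)

def c_tf_idf(label):
    """Score each term in one cluster: normalized frequency times log(1 + A / f)."""
    counts = class_counts[label]
    total = sum(counts.values())
    return {
        term: (n / total) * math.log(1 + avg_words / term_totals[term])
        for term, n in counts.items()
    }

def top_terms(label, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = c_tf_idf(label)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

if __name__ == "__main__":
    for label in class_counts:
        print(label, top_terms(label))
```

On this toy corpus, words shared by both clusters (such as "the" or "bill") score lower than words concentrated in one cluster, which is why the top terms can serve as a rough topic description. LDA starts from the same kind of count data but fits a probabilistic model over it instead of applying a fixed weighting.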
stm
Main Activities
Light refreshments will be provided. | More details: https://unl.libguides.com/unlgisday
Pei-Ying Chen | pchen12@unl.edu | LLS 218D
Last update: November 9, 2025